How LLMs + vector search
make AI tractable for everyone


Old school AI


AI with LLMs


Cassandra's unified compaction strategy, introduced in version 4.0, aims to simplify the 
compaction process and improve overall database performance. It combines several 
compaction strategies into a single, more flexible approach. Here's a brief summary of
Cassandra's unified compaction strategy:


1. Background: 
* In earlier versions of Cassandra, multiple compaction strategies existed, each with its 
own trade-offs and configuration parameters. 
* Unified compaction was introduced to streamline compaction and make it easier to 
manage. 
2. Single Compaction Strategy: 
* Unified compaction merges multiple compaction strategies into a single, unified 
approach. 
* It combines Size-Tiered, Time Window, and Leveled compaction into a more versatile 
strategy. 
3. Dynamic Configuration: 
* Unified compaction allows for dynamic configuration based on workload and 
performance requirements. 
* Users can adjust compaction parameters such as the min and max compaction
throughput, compaction window size, and others to fine-tune performance.


Cassandra's Unified Compaction Strategy (UCS) is a new approach to compaction in
Cassandra that combines aspects of both tiered and leveled compaction strategies. It
introduces sharding, adaptability, and aims to optimize read and write performance based
on user-defined parameters. Here's a summary of how UCS works:


1. Generalization of Compaction: UCS generalizes tiered and leveled compaction strategies, 
treating them as the same by observing that both create levels based on the size of sstables 
and trigger compactions when a certain number of sstables are present on one level. 

2. Density-Based Levels: UCS categorizes sstables into levels based on the logarithm of 
sstable density, with a user-defined fanout factor ($f$) as the base of the logarithm. Each 
level triggers a compaction when it contains a specified number ($t$) of overlapping 
sstables. The values of $f$ and $t$ determine the strategy's behavior, allowing users to 
choose between leveled and tiered compaction. 

3. Sharding: UCS splits sstables at specific shard boundaries based on the density of an 
sstable. These shard boundaries increase with the density of the sstable, enabling 
concurrent compactions. 

4. Size-Based Levels: If we ignore density and splitting, UCS groups sstables into levels based
on their size relative to a memtable flush size. This allows UCS to determine the level of an
sstable based on its size compared to a predefined flush size.
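The size-based leveling described in the summary above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's implementation: the class name UcsLevelSketch and both method names are hypothetical, and the level is assumed to be the floor of the base-f logarithm of (sstable size / flush size), clamped at zero.

```java
// Hypothetical sketch of size-based leveling; not taken from Cassandra's source.
final class UcsLevelSketch {

    // Level = floor(log_f(sstableBytes / flushBytes)), never below 0.
    // Repeated division is used instead of Math.log to avoid rounding surprises
    // when the ratio is an exact power of the fanout.
    static int level(long sstableBytes, long flushBytes, int fanout) {
        if (flushBytes <= 0 || fanout < 2) {
            throw new IllegalArgumentException("flushBytes must be > 0 and fanout >= 2");
        }
        double ratio = (double) sstableBytes / flushBytes;
        int level = 0;
        while (ratio >= fanout) {
            ratio /= fanout;
            level++;
        }
        return level;
    }

    // A level becomes a compaction candidate once it holds at least t sstables
    // (t is the threshold parameter from the summary above).
    static boolean shouldCompact(int sstablesOnLevel, int threshold) {
        return sstablesOnLevel >= threshold;
    }
}
```

With a fanout of 4 and a 100 MB flush size, a 1600 MB sstable lands on level 2, while anything smaller than 400 MB stays on level 0.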


AI with LLMs + context
Finding the right context


Embedding turns any text into vectors


Embeddings capture semantics 


* “Man bites dog” vs “dog bites man”
* “Tourism numbers are collapsing” vs “Travel industry fears Covid-19 crisis will
cause more companies to enter bankruptcy”
* “I need a new phone” vs “My old device is broken”


Vector similarity 
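One common way to compare two embeddings is cosine similarity: the dot product divided by the product of the vector lengths. The sketch below is illustrative only; the class name CosineSketch is made up for this example, and the slides do not say which similarity function is used.

```java
// Hypothetical helper for illustration; not part of Cassandra or JVector.
final class CosineSketch {

    // Returns a value in [-1, 1]: 1 for the same direction, 0 for orthogonal
    // vectors, -1 for opposite directions. A zero-length vector yields NaN.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Texts with similar meaning should embed to vectors with a cosine close to 1, even when they share few words.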


Vector search 101 
As seen in:

* Lucene
* Elastic, Solr, OpenSearch
* Mongo?
* Weaviate
* Qdrant
* Astra (June 2023)


SAI (Storage Attached Indexes)


SSTable lifecycle 




Global ANN everywhere:
    SELECT * FROM demo
    ORDER BY embedding ANN OF ?
    LIMIT 10

Composes with partitioning:
    SELECT * FROM demo
    WHERE partition_id = ?
    ORDER BY embedding ANN OF ?
    LIMIT 100

Composes with other SAI indexes:
    SELECT * FROM demo
    WHERE (c1 = ? AND c2 = ?) OR c3 = ?
    ORDER BY embedding ANN OF ?
    LIMIT 5


Executing ANN queries 


1. Apply SAI predicates across all SSTables
2. Query each ANN index for top K relevant rowIDs
(or brute force it)
3. Merge by score
4. Reorder by primary key
5. Send results to coordinator
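Steps 3 and 4 above can be sketched as follows. This is a simplified illustration under stated assumptions: the names AnnMergeSketch, Hit, and mergeTopK are hypothetical, a higher score is assumed to mean a closer match, and each row is assumed to appear in at most one per-index result list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of merging per-index ANN results; not Cassandra's code.
final class AnnMergeSketch {

    // One candidate row returned by a single ANN index.
    static final class Hit {
        final long primaryKey;
        final float score;

        Hit(long primaryKey, float score) {
            this.primaryKey = primaryKey;
            this.score = score;
        }
    }

    static List<Hit> mergeTopK(List<List<Hit>> perIndexHits, int k) {
        // Step 3: pool all candidates and keep the k best by score.
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndexHits) {
            all.addAll(hits);
        }
        all.sort((x, y) -> Float.compare(y.score, x.score)); // descending score
        List<Hit> top = new ArrayList<>(all.subList(0, Math.min(k, all.size())));

        // Step 4: reorder the survivors by primary key before reading rows.
        top.sort(Comparator.comparingLong((Hit h) -> h.primaryKey));
        return top;
    }
}
```

Sorting the pooled list is the simplest approach; a bounded priority queue would avoid sorting every candidate when the per-index lists are large.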


HNSW in production 


100M Wikipedia vectors


jdk.internal.misc.Unsafe.copySwapMemory0
jdk.internal.misc.Unsafe.copySwapMemory
jdk.internal.misc.ScopedMemoryAccess.copySwapMemoryInternal
jdk.internal.misc.ScopedMemoryAccess.copySwapMemory
java.nio.FloatBuffer.getArray
java.nio.FloatBuffer.get
java.nio.FloatBuffer.get
org.apache.cassandra.io.util.RandomAccessReader.readFloatsAt
org.apache.cassandra.index.sai.disk.hnsw.OnDiskVectors.readVector
org.apache.cassandra.index.sai.disk.hnsw.OnDiskVectors.vectorValue
org.apache.cassandra.index.sai.disk.hnsw.CassandraOnDiskHnswSVectorsWithCache.vectorValue
org.apache.cassandra.index.sai.disk.hnsw.CassandraOnDiskHnswSVectorsWithCache.vectorValue
org.apache.lucene.util.hnsw.HnswGraphSearcher.compare
org.apache.lucene.util.hnsw.HnswGraphSearcher.search
org.apache.cassandra.index.sai.plan.StorageAttachedIndexSearcher$$Lambda$2147.0x00000008017de540.get
Current SOTA


DiskANN: Fast Accurate Billion-point Nearest
Neighbor Search on a Single Node


SPANN: Highly-efficient Billion-scale Approximate 
Nearest Neighbor Search 
Abstract (DiskANN)

Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms
generate indices that must be stored in main memory for fast high-recall search.
This makes them expensive and limits the size of the dataset. We present a
new graph-based indexing and search system called DiskANN that can index,
store, and search a billion point database on a single workstation with just 64GB
RAM and an inexpensive solid-state drive (SSD). Contrary to current wisdom,
we demonstrate that the SSD-based indices built by DiskANN can meet all three
desiderata for large-scale ANNS: high-recall, low query latency and high density
(points indexed per node). On the billion point SIFT1B bigann dataset, DiskANN
serves > 5000 queries a second with < 3ms mean latency and 95%+ 1-recall@1
on a 16 core machine, where state-of-the-art billion-point ANNS algorithms with
similar memory footprint like FAISS [18] and IVFOADC+G+P [8] plateau at
around 50% 1-recall@1. Alternately, in the high recall regime, DiskANN can ...

Abstract (SPANN)

The in-memory algorithms for approximate nearest neighbor search (ANNS) have
achieved great success for fast high-recall search, but are extremely expensive
when handling very large scale database. Thus, there is an increasing request for
the hybrid ANNS solutions with small memory and inexpensive solid-state drive
(SSD). In this paper, we present a simple but efficient memory-disk hybrid indexing
and search system, named SPANN, that follows the inverted index methodology. It
stores the centroid points of the posting lists in the memory and the large posting
lists in the disk. We guarantee both disk-access efficiency (low latency) and
high recall by effectively reducing the disk-access number and retrieving high-quality
posting lists. In the index-building stage, we adopt a hierarchical balanced
clustering algorithm to balance the length of posting lists and augment the posting
list by adding the points in the closure of the corresponding clusters. In the search
stage, we use a query-aware scheme to dynamically prune the access of unnecessary
posting lists. Experiment results demonstrate that SPANN is 2x faster than the
state-of-the-art ANNS solution DiskANN to reach the same recall quality 90% ...
Product quantization


Original vector

Decompose into M d'-dimensional subcomponents

Replaced with id of nearest centroid
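The encoding step on this slide can be sketched as below. This is a minimal illustration, not JVector's API: the class name PqSketch and the codebook layout are assumptions, the codebooks are taken as already trained (for example with k-means), and each subspace is assumed to have at most 256 centroids so that an id fits in one byte.

```java
// Hypothetical product quantization encoder; not JVector's implementation.
final class PqSketch {

    // codebooks[m][c] is centroid c of subspace m, with length d' = d / M.
    // Returns one centroid id per subspace (read back with `code & 0xFF`).
    static byte[] encode(float[] vector, float[][][] codebooks) {
        int subspaces = codebooks.length;
        if (subspaces == 0 || vector.length % subspaces != 0) {
            throw new IllegalArgumentException("dimension must be divisible by M");
        }
        int subDim = vector.length / subspaces;
        byte[] codes = new byte[subspaces];
        for (int m = 0; m < subspaces; m++) {
            int best = 0;
            float bestDist = Float.POSITIVE_INFINITY;
            for (int c = 0; c < codebooks[m].length; c++) {
                // Squared Euclidean distance between subvector m and centroid c.
                float dist = 0f;
                for (int j = 0; j < subDim; j++) {
                    float diff = vector[m * subDim + j] - codebooks[m][c][j];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            codes[m] = (byte) best;
        }
        return codes;
    }
}
```

Under these assumptions, a d-dimensional float vector (4d bytes) shrinks to M bytes, at the cost of approximating each subvector by its nearest centroid.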


JVector 


[Screenshot: the jvector GitHub repository page (public, Apache-2.0 license, 287 commits, 735 stars, 41 forks)]

About

JVector: the most advanced
embedded vector search engine


QPS on Deep100M dataset (24GB Macbook)

[Chart: Lucene vs. JVector]
Closing thoughts


The fine print 


(things that don't work yet) 


* Paging
* CL > 1
* [illegible]


Lessons learned 


* Not correct to apply indexes to SSTables separately
* OpenAI embeddings are ridiculously overparameterized
    * Use e5-v2
* Kmeans++ doesn't need a lot of iterations
* Vamana needs room to add edges
* Surprisingly hard to predict how many nodes will be visited when searching
for the top K of B items in a graph of N


[Figure: "Graph for All Original Datasets with Re-fitted Parameters and Best Fit Coefficient for Filtered Data (f < 100000)"; axis: f (nodes visited); legend: Datasets]
Where's the code?


* JVector: https://github.com/jbellis/jvector/

* Cassandra + vector search (today):
https://github.com/datastax/cassandra/tree/vsearch

* Cassandra + vector search (RSN):
https://issues.apache.org/jira/browse/CASSANDRA-18557


